Feature/hpc setup task #47

Open
wants to merge 19 commits into
base: develop

rlskoeser
Contributor

attempt to implement #44

@rlskoeser rlskoeser changed the base branch from develop to feature/script-update-model February 4, 2025 23:06
@rlskoeser rlskoeser force-pushed the feature/hpc-setup-task branch from 28e1d3b to 98bcf30 Compare February 5, 2025 16:54
@cmroughan
Collaborator

cmroughan commented Feb 5, 2025

Checking in admin for the results of a new user's train task -- the task looks to have completed successfully, with the model indeed uploaded to eScr. The task report messaging shows that we did hit an error -- am I remembering correctly that right now the setup script is installing a different branch of htr2hpc? Maybe that's causing the disconnect:

Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
Failed to load _MLCPUComputeDeviceProxy: No module named 'coremltools.libcoremlpython'
Failed to load _MLGPUComputeDeviceProxy: No module named 'coremltools.libcoremlpython'
Failed to load _MLNeuralEngineComputeDeviceProxy: No module named 'coremltools.libcoremlpython'
Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
Failed to load _MLComputePlanProxy: No module named 'coremltools.libcoremlpython'
Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
Failed to load _MLModelAssetProxy: No module named 'coremltools.libcoremlpython'
WARNING:py.warnings:/home/wh4213/.conda/envs/htr2hpc/lib/python3.11/site-packages/PIL/Image.py:2926: RuntimeWarning: divide by zero encountered in divide
  As = 1.0 / w

(See in admin the report for Task 4296.)

Update:

I attempted to replicate this with my own user by running the htr2hpc update task and then a train task, but I did not get the missing coremltools error, probably because coremltools is already installed in my htr2hpc conda env and reinstalling the htr2hpc package did nothing to change that. We will still want to pin down why coremltools is not getting set up by default.
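
One way to compare environments is to ask the import system directly whether the modules named in the warnings can be located. The snippet below is only a sketch of such a check, not something taken from the htr2hpc code; the helper name `module_available` is made up for illustration. It would need to be run with the Python interpreter of the conda env being inspected.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the import system can locate `name` (hypothetical helper)."""
    try:
        # For a dotted name, find_spec imports the parent package first,
        # so checking a coremltools submodule also imports coremltools itself.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


if __name__ == "__main__":
    # Module names copied from the warnings in the task report.
    for name in ("coremltools", "coremltools.libcoremlpython"):
        status = "found" if module_available(name) else "MISSING"
        print(f"{name}: {status}")
```

If the first name is found and the second is missing, that would suggest the package is installed but its compiled component is not, which is a different situation from coremltools never having been installed at all.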

Base automatically changed from feature/script-update-model to develop February 6, 2025 14:53
@rlskoeser rlskoeser requested a review from cmroughan February 10, 2025 21:21
@cmroughan
Collaborator

Commit d9f693b addresses the issues that Wouter and I encountered, which were leading to the coremltools warnings above. The problem was caused by the error below, which occurred when creating the env from the environment_cuda.yml file:

Pip subprocess error:
ERROR: file:///. (from -r /tmp/condaenv.krtta1mm.requirements.txt (line 2)) does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.


CondaEnvException: Pip failed

I noticed this error was only occurring when trying to do conda env create from inside your shared kraken directory, so the fix simply copies that kraken directory to the user's own local scratch, runs the necessary code, and then deletes the user's copy after completion.
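
The copy, run, delete pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from commit d9f693b: the function name `run_in_private_copy` and its parameters are hypothetical, and the real setup script may be structured quite differently.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_private_copy(shared_dir, scratch_root, command):
    """Copy shared_dir into a temporary directory under scratch_root,
    run command from inside the copy, then delete the copy.

    All names here are hypothetical; this only illustrates the pattern.
    """
    shared_dir = Path(shared_dir)
    # TemporaryDirectory removes the copy on exit, even if the command fails.
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        work_dir = Path(tmp) / shared_dir.name
        shutil.copytree(shared_dir, work_dir)
        # check=True raises CalledProcessError on a non-zero exit status.
        return subprocess.run(
            command,
            cwd=work_dir,
            check=True,
            capture_output=True,
            text=True,
        )
```

With this helper, the env creation step might look like `run_in_private_copy(shared_kraken_dir, user_scratch_dir, ["conda", "env", "create", "-f", "environment_cuda.yml"])`, where both directory variables stand in for paths that are not given in this thread.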

@rlskoeser
Contributor Author

ah! it makes sense there could be downsides to using the shared directory, that isn't a common setup (and I probably wouldn't get the error since I own those files). Thanks for the good sleuthing!

Labels
None yet
Projects
Status: Under Review
2 participants